"""
Strategies submitted to Axelrod's first tournament. All strategies in this
module are prefixed by `FirstBy` to indicate that they were submitted to
Axelrod's first tournament by the given author.

Note that these strategies are implemented from the descriptions presented
in:

Axelrod, R. (1980). Effective Choice in the Prisoner’s Dilemma.
Journal of Conflict Resolution, 24(1), 3–25.

These descriptions are not always clear or precise; where assumptions have
been made, they are explained in the strategy docstrings.
"""

from typing import Dict, List, Optional, Tuple

from scipy.stats import chisquare

from axelrod.action import Action
from axelrod.player import Player
from axelrod.strategy_transformers import FinalTransformer

from .memoryone import MemoryOnePlayer

C, D = Action.C, Action.D

class FirstByDowning(Player):
    """
    Submitted to Axelrod's first tournament by Downing

    The description written in [Axelrod1980]_ is:

    > "This rule selects its choice to maximize its own longterm expected payoff on
    > the assumption that the other rule cooperates with a fixed probability which
    > depends only on whether the other player cooperated or defected on the previous
    > move. These two probabilities estimates are continuously updated as the game
    > progresses. Initially, they are both assumed to be .5, which amounts to the
    > pessimistic assumption that the other player is not responsive. This rule is
    > based on an outcome maximization interpretation of human performances proposed
    > by Downing (1975)."

    The Downing (1975) paper is "The Prisoner's Dilemma Game as a
    Problem-Solving Phenomenon" [Downing1975]_, and it is used to implement
    the strategy.

    There are a number of specific points in this paper. On page 371:

    > "[...] In these strategies, O's [the opponent's] response on trial N is in
    > some way dependent or contingent on S's [the subject's] response on trial
    > N - 1. All varieties of these lag-one matching strategies can be defined by
    > two parameters: the conditional probability that O will choose C following
    > C by S, P(C_o | C_s) and the conditional probability that O will choose C
    > following D by S, P(C_o | D_s)."

    Throughout the paper the strategy (S) assumes that the opponent (O) is
    playing a reactive strategy defined by these two conditional probabilities.
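
    As an illustration of this opponent model, the sketch below builds a
    lag-one reactive opponent from the two conditional probabilities and
    estimates the average payoff of a fixed move against it. It is not part of
    the library: the names `reactive_opponent` and `average_payoff` are
    hypothetical, moves are the strings 'C' and 'D', and the standard payoffs
    (R, P, S, T) = (3, 1, 0, 5) are assumed.

    ```python
    import random
    from typing import Callable


    def reactive_opponent(
        p_after_c: float, p_after_d: float, rng: random.Random
    ) -> Callable[[str], str]:
        # O cooperates with probability P(C_o | C_s) after S plays C and with
        # probability P(C_o | D_s) after S plays D.
        def respond(previous_s_move: str) -> str:
            p = p_after_c if previous_s_move == 'C' else p_after_d
            return 'C' if rng.random() < p else 'D'

        return respond


    def average_payoff(
        move: str,
        opponent: Callable[[str], str],
        rounds: int,
        R: int = 3,
        P: int = 1,
        S: int = 0,
        T: int = 5,
    ) -> float:
        # S plays `move` every round, so O always responds to `move`
        # (the first round is treated the same way for simplicity).
        payoff = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}
        total = sum(payoff[(move, opponent(move))] for _ in range(rounds))
        return total / rounds


    opponent = reactive_opponent(0.5, 0.5, random.Random(0))
    print(average_payoff('C', opponent, 100_000))
    ```

    With P(C_o | C_s) = 1/2, the printed average should be close to
    1/2 * 3 + 1/2 * 0 = 1.5, the expected per-round payoff of always
    cooperating against such an opponent.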

    The strategy aims to maximize its long-run utility against such an opponent;
    the mechanism for this is described in Appendix A (discussed below).

    One final point from the main text, on page 372:

    > "For the various lag-one matching strategies of O, the maximizing
    > strategies of S will be 100% C, or 100% D, or for some strategies all S
    > strategies will be functionally equivalent."

    This implies that the strategy S will either always cooperate or always
    defect (or be indifferent), depending on the opponent's defining
    probabilities.

    To understand the particular mechanism that describes the strategy S, we
    refer to Appendix A of the paper on page 389.

    The stated goal of the strategy is to maximize (using the notation of the
    paper):

        EV_TOT = #CC(EV_CC) + #CD(EV_CD) + #DC(EV_DC) + #DD(EV_DD)

    This differs from the more modern literature, where #CC, #CD, #DC and #DD
    would denote the number of rounds in which both players play C, the first
    plays C and the second D, and so on. Here the author instead reasons about
    the sequence of plays by the player (S) alone, so #CC denotes the number of
    times the player plays C twice in a row.
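
    The following sketch counts these quantities for a sequence of plays by S,
    using the strings 'C' and 'D' for moves; the helper name
    `count_transitions` is hypothetical and not part of the library.

    ```python
    from collections import Counter
    from typing import Dict, Sequence


    def count_transitions(plays: Sequence[str]) -> Dict[str, int]:
        # Pair each play by S with S's next play: 'CCD' gives 'CC' then 'CD'.
        pairs = Counter(a + b for a, b in zip(plays, plays[1:]))
        return {key: pairs[key] for key in ('CC', 'CD', 'DC', 'DD')}


    print(count_transitions('CCDCC'))  # {'CC': 2, 'CD': 1, 'DC': 1, 'DD': 0}
    ```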

    On the second page of the appendix, figure 4 (page 390) gives an
    expression for EV_TOT. A specific term of this expression vanishes when
    T - R = P - S (which is not the case for the standard
    (R, P, S, T) = (3, 1, 0, 5), where T - R = 2 and P - S = 1):

    > "Where (t - r) = (p - s), EV_TOT will be a function of alpha, beta, t, r,
    > p, s and N are known and V which is unknown."

    Here V is the total number of cooperations by the player S (this is defined
    earlier, in the abstract), so the final expression, in which only V is
    unknown, can be used to decide whether or not S should always cooperate.

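    This final expression is used to show that EV_TOT is linear in the number
    of cooperations by the player. A linear function of V over 0 <= V <= N is
    either constant or maximized at an endpoint, which justifies the claim that
    the player will either always cooperate or always defect (or be
    indifferent).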

    All of the above details are used to give the following interpretation of
    the strategy:

    1. On any given turn, the strategy will estimate alpha = P(C_o | C_s) and
    beta = P(C_o | D_s).
    2. The strategy will calculate the expected utility of always playing C OR
    always playing D against the estimated probabilities. This corresponds to:

        a. In the case of the player always cooperating:

           P_CC = alpha and P_CD = 1 - alpha

        b. In the case of the player always defecting:

           P_DC = beta and P_DD = 1 - beta


    Using this we have:

        E_C = alpha R + (1 - alpha) S
        E_D = beta T + (1 - beta) P

    Thus at every turn, the strategy will calculate those two values and
    cooperate if E_C > E_D and will defect if E_C < E_D.

    In the case of E_C = E_D, the player will alternate from their previous
    move. This is based on a specific sentence from Axelrod's original paper:

    > "Under certain circumstances, DOWNING will even determine that the best
    > strategy is to alternate cooperation and defection."
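
    The decision rule above, including the alternation on ties, can be
    sketched as a standalone function. This is a simplified illustration
    rather than the library code: the name `downing_decision` is hypothetical,
    moves are the strings 'C' and 'D', and the payoffs default to the standard
    (R, P, S, T) = (3, 1, 0, 5).

    ```python
    def downing_decision(
        alpha: float,
        beta: float,
        previous: str,
        R: float = 3,
        P: float = 1,
        S: float = 0,
        T: float = 5,
    ) -> str:
        # Expected payoff per round of always cooperating and of always defecting.
        expected_c = alpha * R + (1 - alpha) * S
        expected_d = beta * T + (1 - beta) * P
        if expected_c > expected_d:
            return 'C'
        if expected_c < expected_d:
            return 'D'
        # E_C = E_D: alternate from the previous move.
        return 'D' if previous == 'C' else 'C'


    print(downing_decision(0.9, 0.1, previous='D'))  # C
    print(downing_decision(1.0, 0.5, previous='D'))  # C (tie: 3.0 == 3.0)
    ```

    Like the `strategy` method below, the sketch compares floating point
    values exactly, so a tie only occurs when both expected values are
    computed to the same float.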

    One final important point is the early game behaviour of the strategy. As
    stated in Axelrod's description, the strategy initially assumes that alpha
    and beta are both 1/2:

    > "Initially, they are both assumed to be .5, which amounts to the
    > pessimistic assumption that the other player is not responsive."

    Note that if alpha = beta = 1 / 2 then:

        E_C = alpha R + alpha S
        E_D = alpha T + alpha P

    And from the defining properties of the Prisoner's Dilemma (T > R > P > S)
    this gives: E_D > E_C.
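
    The size of this gap can be checked with exact fractions. In the sketch
    below (the function name `opening_gap` is hypothetical), E_D - E_C is
    computed with alpha = beta = 1/2; it simplifies to ((T - R) + (P - S)) / 2,
    which is positive for any Prisoner's Dilemma.

    ```python
    from fractions import Fraction


    def opening_gap(R: int, P: int, S: int, T: int) -> Fraction:
        # E_D - E_C when alpha = beta = 1/2.
        half = Fraction(1, 2)
        expected_c = half * R + half * S
        expected_d = half * T + half * P
        return expected_d - expected_c


    print(opening_gap(3, 1, 0, 5))  # 3/2
    ```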
    Thus, the player opens with a defection. From the Axelrod publications
    alone, only the defection in the opening round is clear; nothing in them
    indicates a defection in the second round as well. However, a presentation
    available at
    http://www.sci.brooklyn.cuny.edu/~sklar/teaching/f05/alife/notes/azhar-ipd-Oct19th.pdf
    clearly states that Downing defected in the first two rounds, so this is
    assumed to be the behaviour. Interestingly, in later tournaments this
    strategy was revised to not defect in the opening two rounds.

    It is assumed that these two opening defections are used to create initial
    estimates of beta = P(C_o | D_s), while the opponent's opening play is used
    to estimate alpha = P(C_o | C_s). Thus we assume that the opponent's first
    play is a response to a cooperation "before the match starts".

    So for example, if the plays are:

    [(D, C), (D, C)]

    Then the opponent's first cooperation counts as a cooperation in response
    to the non-existent cooperation of round 0, so the total number of
    cooperations in response to a cooperation is 1. We need to take into
    account that extra phantom cooperation to estimate the probability
    alpha = P(C_o | C_s) as 1 / 1 = 1.
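
    The counting described above can be sketched as a function of the two
    histories. It mirrors the bookkeeping in `strategy` below but is only an
    illustration: the name `estimate_alpha_beta` is hypothetical and moves are
    the strings 'C' and 'D'.

    ```python
    from typing import List, Tuple


    def estimate_alpha_beta(
        my_plays: List[str], their_plays: List[str]
    ) -> Tuple[float, float]:
        # The opponent's first play responds to a phantom cooperation before
        # round 1.
        responses_to_c = 1 if their_plays and their_plays[0] == 'C' else 0
        responses_to_d = 0
        # Every later opponent play responds to our play in the previous round.
        for mine, theirs in zip(my_plays, their_plays[1:]):
            if theirs == 'C':
                if mine == 'C':
                    responses_to_c += 1
                else:
                    responses_to_d += 1
        # Denominators as in `strategy`: +1 for the phantom cooperation and at
        # least 2 for the two opening defections.
        alpha = responses_to_c / (my_plays.count('C') + 1)
        beta = responses_to_d / max(my_plays.count('D'), 2)
        return alpha, beta


    print(estimate_alpha_beta(['D', 'D'], ['C', 'C']))  # (1.0, 0.5)
    ```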

    This is an assumption with no clear indication from the literature.

    --
    This strategy came 10th in Axelrod's original tournament.

    Names:

    - Downing: [Axelrod1980]_
    """

    name = "First by Downing"

    classifier = {
        "memory_depth": float("inf"),
        "stochastic": False,
        "long_run_time": False,
        "inspects_source": False,
        "manipulates_source": False,
        "manipulates_state": False,
    }

    def __init__(self) -> None:
        super().__init__()
        self.number_opponent_cooperations_in_response_to_C = 0
        self.number_opponent_cooperations_in_response_to_D = 0

    def strategy(self, opponent: Player) -> Action:
        """Actual strategy definition that determines player's action."""
        round_number = len(self.history) + 1

        if round_number == 1:
            return D
        if round_number == 2:
            if opponent.history[-1] == C:
                self.number_opponent_cooperations_in_response_to_C += 1
            return D

        if self.history[-2] == C and opponent.history[-1] == C:
            self.number_opponent_cooperations_in_response_to_C += 1
        if self.history[-2] == D and opponent.history[-1] == C:
            self.number_opponent_cooperations_in_response_to_D += 1

        # Add 1 to the number of cooperations for the assumed phantom
        # cooperation to which the opponent's first move responds. See the
        # docstring for more information.
        alpha = self.number_opponent_cooperations_in_response_to_C / (
            self.cooperations + 1
        )
        # Use at least 2 as the number of defections, on the assumption that
        # the first two moves are defections, which may not be true in a
        # noisy match.
        beta = self.number_opponent_cooperations_in_response_to_D / max(
            self.defections, 2
        )

        R, P, S, T = self.match_attributes["game"].RPST()
        expected_value_of_cooperating = alpha * R + (1 - alpha) * S
        expected_value_of_defecting = beta * T + (1 - beta) * P

        if expected_value_of_cooperating > expected_value_of_defecting:
            return C
        if expected_value_of_cooperating < expected_value_of_defecting:
            return D
        return self.history[-1].flip()